Optical remote sensing images have been widely used for temporal monitoring. The data are acquired by satellite sensors and have better spatial resolution than in-situ measurements from meteorological stations. The problem with using optical images is cloud cover, which blocks the ground and near-ground information collected by satellites. To overcome this problem, especially for thermal bands, we propose a procedure that combines aggregation and spatial interpolation to obtain time series data over a region. There is still no reference for selecting the data period over which to calculate the aggregate value and apply spatial interpolation. We propose an assessment that applies Yamane's formula in the time domain and a threshold on the number of pixels in the spatial domain. Hourly Himawari-8 data over Java Island were used. The algorithm is applied to a sequence of periodic datasets to obtain a time series of aggregate data for meteorological applications. This study recommends using three-month periods of data over the eastern part of Java.
1. INTRODUCTION


Time series data of dynamic parameters of the earth's environment are commonly used in meteorological applications, for example in climate change monitoring [1]-[3]. For such applications, in-situ data from measurement stations are often the preferred data source. Obtaining temporal data from meteorological stations is not a problem; however, these data do not have good spatial resolution because of the limited number of stations [4].

Optical remote sensing satellite images have many advantages. The technology continues to advance, with continuous development from many providers, and the data cover the whole world at high spatial and temporal resolution. The main problem is the presence of clouds [5], which occur frequently in Indonesia, although different land areas have different yearly patterns [6]. Clouds block solar radiation and absorb radiation emitted from the ground, hiding ground and near-ground information from the satellite sensors. To use data from optical satellites, data availability should therefore be examined carefully.

When the data are used to monitor land surfaces that change little, reflectance bands can be mosaicked to remove clouds [7]. Selected temporal data can also be used directly to monitor land use and land cover change [8], [9], and aggregates can be calculated directly, as in composite vegetation indices [10], [11]. A different approach is needed when dynamic parameters are observed, such as weather parameters or brightness temperatures in thermal bands that are related to weather parameters.

For optical remote sensing data, reconstruction of brightness temperature is the usual approach, and there are several ways to obtain complete spatial-temporal data. Neteler [12] uses spatial-based reconstruction and tests it with up to 75% simulated missing values. Aggregation can be performed well right after the interpolation, but interpolating each image is time-consuming. Spatial-temporal reconstruction is more popular. Spatiotemporal kriging is used in [13], but the completeness of each image does not always reach 100%. He et al. [14] use Bayesian reconstruction with good accuracy, but the time cost is high for larger images. Temporal-based reconstruction is another option that shows good results. The method of Zhou et al. [15] can be applied to land parameters such as land surface temperature, provided there is no abrupt change in value, and it was validated only with a small proportion of missing values, up to 21.7%. Given the high cloud occurrence in Indonesia, it is therefore difficult to implement this method there. Luo et al. [16] use generative adversarial networks based on recurrent neural networks to generate training samples from meteorological data, in which some values were removed to simulate missing information. The method can be applied to satellite data, but the time cost is high because training is done pixel by pixel. The main problem with all these methods is the time cost. In this research, we propose a procedure to generate time series data with reduced processing time. Instead of reconstructing complete data, we calculate aggregate data for each pixel location in the time domain before performing spatial interpolation. To do this, we propose an assessment procedure based on data availability, which is the novelty of this research. The assessment aims to find the minimum period of Himawari-8 data from which statistical information can be created via aggregation and then spatially interpolated. This period is then repeated to create time series information.

A thermal band containing brightness temperature will be used in the experiment. The interpolation 
method will be evaluated before being used, and spatial correlation between the result and the related 
physical parameter will be calculated. Evaluation of the aggregate value of brightness temperature is not in 
the scope of this research since it is difficult to find other data for comparison. 


2. METHOD 

In this study, the data were limited to a one-year period, and Java Island was selected as the area of interest. There are two main requirements before applying aggregation and spatial interpolation: the amount of available data and its distribution pattern. Homogeneous areas may require fewer samples, but there are still no exact criteria for the distribution pattern in this case. Therefore, this study focuses primarily on data availability.

Images from the Himawari-8 satellite were used because of their high temporal resolution. The imager is onboard a geostationary satellite and covers the western Pacific region, including the whole Indonesian area. The period from mid-2019 to mid-2020 was selected to simplify the analysis, because it was not strongly affected by complex climate phenomena such as El Niño and La Niña. The data were processed into level-2 high-resolution cloud analysis information (HCAI), which contains cloud classification information. HCAI data were distributed hourly by the Indonesian Meteorological, Climatological, and Geophysical Agency. Some data are missing, but the percentage is very low. HCAI is generated by an algorithm based on the Fundamental Cloud Product and brightness temperature values from Himawari-8 images, which classifies cloud cover as clear, cumulonimbus, dense cloud, upper cloud, middle cloud, cumulus, stratocumulus, or stratus/fog [17].

This research performed two kinds of tests: first, in the time domain, to find pixel locations with a valid aggregate value; and second, in the spatial domain, to investigate whether spatial interpolation can be applied. Taro Yamane's formula (1) for estimating the minimum sample size of a population was used as the first criterion, and the requirements for applying it at a 95% confidence level were assumed to be fulfilled. The number of no-cloud values at each pixel location in a period should be equal to or greater than the number of samples calculated by the formula. No-cloud information is retrieved from HCAI pixels flagged as clear. This step yields an aggregate value at every qualified pixel.
$$ n = \frac{N}{1 + N e^{2}} \qquad (1) $$

where N is the size of the population, n is the number of samples, and e is the margin of error. The margin of error is set to 5%, consistent with the 95% confidence level. Yamane's formula has been used in remote sensing applications [18]; there it was applied in the spatial domain, whereas in this research it is applied in the time domain at every pixel location.
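As a minimal sketch of the first criterion (not the authors' code), formula (1) can be evaluated for each period and compared with a pixel's count of clear-sky hours. Rounding the result up to a whole number of observations is our assumption; it reproduces the daily minimum of 23 reported in Section 3.1. The helper names are hypothetical.

```python
import math


def yamane_min_samples(population: int, margin_of_error: float = 0.05) -> int:
    """Minimum sample size from Yamane's formula, n = N / (1 + N * e^2).

    Rounding up to a whole number of observations is an assumption; it
    reproduces the daily minimum of 23 clear-sky hours out of 24.
    """
    if population <= 0:
        raise ValueError("population must be a positive number of observations")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


def passes_first_criterion(clear_hours: int, hours_in_period: int) -> bool:
    """Time-domain test for one pixel: enough clear-sky hours in the period."""
    return clear_hours >= yamane_min_samples(hours_in_period)


print(yamane_min_samples(24))          # 23 (daily scenario, N = 24 hourly slots)
print(passes_first_criterion(22, 24))  # False: 22 clear hours are not enough
```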

The area of interest was divided into eight regions, as shown in Figure 1, to better represent the spatial distribution. Each region spans the same range of longitude. The number of pixels in each region is 1522 (R1), 4100 (R2), 3993 (R3), 2688 (R4), 3943 (R5), 4584 (R6), 3003 (R7), and 2128 (R8). A threshold of 30% of the pixels in each region was selected as the second criterion, based on previous research on image restoration [19], in which restoration accuracy was still acceptable with 70% of the pixels removed. Other research shows that using 25% subsamples of sediment depth data results in low accuracy in the prediction of sediment volume [20]. The thresholds, calculated as 30% of the number of pixels in each region and rounded up, are 457 (R1), 1230 (R2), 1198 (R3), 807 (R4), 1183 (R5), 1376 (R6), 901 (R7), and 639 (R8). Spatial interpolation can then be applied to the aggregate data. Inverse distance weighting (IDW) and Kriging are widely used spatial interpolation methods. In some cases, Kriging outperforms IDW [21]-[23], so Kriging interpolation is used for evaluation in this research.
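The region split and the spatial-domain thresholds can be sketched as follows. The longitude bounds passed to `region_index` are hypothetical inputs, since the exact bounds used for Java are not given here, and integer ceiling division is our assumption, chosen because it reproduces all eight thresholds listed above.

```python
import math

# Pixel counts per region, as listed in the text.
REGION_PIXELS = {"R1": 1522, "R2": 4100, "R3": 3993, "R4": 2688,
                 "R5": 3943, "R6": 4584, "R7": 3003, "R8": 2128}


def region_index(lon: float, lon_min: float, lon_max: float, n_regions: int = 8) -> int:
    """0-based index of the equal-width longitude strip containing `lon`.

    lon_min / lon_max are hypothetical inputs: the bounds of the area of interest.
    """
    width = (lon_max - lon_min) / n_regions
    idx = math.floor((lon - lon_min) / width)
    return min(max(idx, 0), n_regions - 1)  # put lon == lon_max in the last strip


def spatial_threshold(n_pixels: int, percent: int = 30) -> int:
    """Second criterion: `percent` of the region's pixels, rounded up (integer maths)."""
    return (n_pixels * percent + 99) // 100


thresholds = {r: spatial_threshold(n) for r, n in REGION_PIXELS.items()}
print(thresholds)
```

Integer arithmetic is used for the threshold so that a value such as 30% of 4100 cannot be pushed to 1231 by floating-point error before rounding up.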
Figure 1. Java Island divided into eight regions (R1 to R8)


For each region, the test was applied in four scenarios: daily, monthly, three-month periods, and six-month periods. The algorithm is the same for all scenarios, as shown in Figure 2. The threshold in the time domain is not fixed; it depends on the number of hourly observations in the period. As shown in the flow chart, a successful region means that aggregated and spatially interpolated data can potentially be generated over that region, while a failed region means the opposite.
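The two-stage test for one region and one period could look like the sketch below. It assumes a boolean array `clear` (True where the HCAI class is clear) and a matching array of band-13 values, both shaped (hours, pixels), and it assumes missing hourly files are stored as not clear. The function name and the toy data are hypothetical; the aggregate shown is a plain mean over clear-sky hours.

```python
import math

import numpy as np


def assess_region(clear, values, threshold_pixels, margin_of_error=0.05):
    """Two-stage test for one region and one period (hypothetical helper).

    clear  : bool array, shape (hours, pixels); True where HCAI is 'clear'
             (missing hours are assumed to be stored as False)
    values : float array, same shape, e.g. band-13 brightness temperature in K
    """
    clear = np.asarray(clear, dtype=bool)
    values = np.asarray(values, dtype=float)
    n_hours = clear.shape[0]  # N in formula (1): hourly slots in the period
    n_min = math.ceil(n_hours / (1 + n_hours * margin_of_error ** 2))

    # Time domain: count clear-sky hours per pixel and apply the first criterion.
    clear_counts = clear.sum(axis=0)
    qualified = clear_counts >= n_min

    # Aggregate (mean over clear-sky hours) only at qualified pixels; NaN elsewhere.
    sums = np.where(clear, values, 0.0).sum(axis=0)
    aggregate = np.full(clear.shape[1], np.nan)
    aggregate[qualified] = sums[qualified] / clear_counts[qualified]

    # Spatial domain: the region succeeds if enough pixels qualified.
    success = int(qualified.sum()) >= threshold_pixels
    return qualified, aggregate, success


# Toy example: one day (24 hourly slots), five pixels, threshold of 2 pixels.
clear = np.ones((24, 5), dtype=bool)
clear[:2, 1] = False   # pixel 1 has only 22 clear hours (< 23)
clear[:, 2] = False    # pixel 2 is cloudy all day
values = np.full((24, 5), 290.0)
qualified, aggregate, success = assess_region(clear, values, threshold_pixels=2)
print(qualified, aggregate, success)
```

Because N is taken from the array's first dimension, the same function serves the daily, monthly, three-month, and six-month scenarios.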


Figure 2. Flow chart of the method (Pixels of HCAI → Time Domain → Spatial Domain → Successful / Failed Region)


The data range from July 2019 to June 2020. A seasonal grouping based on the Asian-Australian monsoon was used, consisting of four periods: December-February, March-May, June-August, and September-November. To create the three-month and six-month periods, data from June 2020 were assumed to replace data from June 2019, since year-to-year variations in cloud coverage in Indonesia are relatively small [6].
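The period lengths, and hence the time-domain thresholds, follow from the calendar. The sketch below assembles the seasonal and six-month periods with June 2020 standing in for June 2019, as assumed above, and applies formula (1) with rounding up (our assumption). Note that February 2020 has 29 days.

```python
import calendar
import math


def hours_in(months):
    """Number of hourly slots in a list of (year, month) pairs."""
    return sum(calendar.monthrange(year, month)[1] * 24 for year, month in months)


def yamane_min_samples(n, e=0.05):
    return math.ceil(n / (1 + n * e ** 2))  # rounding up is an assumption


# Data cover July 2019 to June 2020; June 2020 stands in for June 2019.
SEASONS = {
    "June-August": [(2020, 6), (2019, 7), (2019, 8)],
    "September-November": [(2019, 9), (2019, 10), (2019, 11)],
    "December-February": [(2019, 12), (2020, 1), (2020, 2)],
    "March-May": [(2020, 3), (2020, 4), (2020, 5)],
}
HALF_YEARS = {
    "June-November": SEASONS["June-August"] + SEASONS["September-November"],
    "December-May": SEASONS["December-February"] + SEASONS["March-May"],
}

for name, months in {**SEASONS, **HALF_YEARS}.items():
    n_hours = hours_in(months)
    print(f"{name:<20} N = {n_hours} h, minimum clear hours = {yamane_min_samples(n_hours)}")
```

Under these assumptions every three-month period needs 339 clear hours and every six-month period 367, so the required fraction of clear hours falls from about 15% to about 8% as the period doubles, which is why longer periods pass the first criterion more easily.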
The evaluation includes a comparison of the results between regions and visual inspection of the spatial distribution of the successful regions. Band 13 of Himawari-8 data is used to represent the dynamic parameter; it has been used in algorithms for land surface temperature estimation [24], [25]. Land surface temperature products from Himawari-8 have not been published, so band 13 is used for the analysis. The result was compared with skin temperature data retrieved from the ERA5 database by calculating the spatial correlation. Level-1 Himawari-8 data from December to May were used to retrieve band 13 spectral data. In addition, some of the qualified pixels were used to validate the spatial interpolation method.


3. RESULTS AND DISCUSSION 
3.1. Scenario 1: Daily 

With 24 hours per day, the minimum number of available observations in one day, as calculated by (1), is 23; this is the first criterion. For each region, the number of pixels that passed the first criterion was then compared with the threshold in the spatial domain, which is the second criterion. Table 1 shows, for each month, the number of days that passed both criteria in each region. From December to May, the number of available days is very small, which shows that cloud occurrence is very high in the wet season. In the remaining months, the available days still do not cover the whole month.


Table 1. Number of available days that passed both criteria
Month       R1  R2  R3  R4  R5  R6  R7  R8
July         0   0   5   2   6   4   5   3
August       1   3   6   0   4   2   1   2
September    2   2   5   1   6   4   3   4
October      0   1   2   0   3   5   4   3
November     1   0   0   0   0   0   1   1
December     0   0   0   0   0   0   0   0
January      0   0   0   0   0   0   0   0
February     0   0   0   0   0   0   0   0
March        0   0   0   0   0   0   0   0
April        0   0   0   0   0   0   0   0
May          0   0   0   0   0   0   1   0
June         0   0   1   0   0   1   2   2


3.2. Scenario 2: Monthly 

In a one-month period, the minimum number of available observations required by the first criterion depends on the number of days in the month. Table 2 shows the number of qualified pixels in each region that passed the first criterion and may therefore contain a monthly aggregate value. All regions passed the second criterion from July to November. In April, only region R8 passed the criteria. In May, three regions passed the criteria, while in June seven regions did. No region passed the criteria in all twelve months.


Table 2. Number of qualified pixels by month
Month        R1    R2    R3    R4    R5    R6    R7    R8
July       1497  4084  3546  2266  3883  4318  2861  1875
August     1507  4075  3359  2218  3891  4362  2822  1868
September  1519  4098  3502  2296  3888  4420  2867  1945
October    1170  3083  3424  1810  3716  4574  2976  2108
November   1227  1512  2600  1030  2221  4116  2866  2067
December      0     0     0     0     0     0     0     5
January       0     0     0     0     0     0     0   103
February      0     0     0     0     0     0     0     3
March         0     0     0     0     0     0     0     5
April         0     0     0     0     0     0    99   807
May           0     0     0    45   320  1254   759   913
June         38  1429  2438  1777  3594  4279  2893  1909


3.3. Scenario 3: Three-month period 

In a three-month period, the minimum number of available observations depends on the number of days in the period. Table 3 shows the number of qualified pixels in each region that passed the first criterion and may contain a three-month aggregate value. Only three regions (R6, R7, and R8) passed the criteria in all periods; the other regions failed in the December-February period.
Table 3. Number of qualified pixels by three-month periods
Period                R1    R2    R3    R4    R5    R6    R7    R8
June-August         1522  4100  3993  2687  3943  4581  3003  2126
September-November  1522  4100  3990  2671  3943  4584  3003  2128
December-February      0     1     5    14   117  1411  1472  2014
March-May           1362  3557  3703  2506  3897  4562  2988  2123


3.4. Scenario 4: Six-month period 

In a six-month period, the minimum number of available observations depends on the number of days in the period. Table 4 shows the number of qualified pixels in each region that passed the first criterion and may contain a six-month aggregate value. All regions passed the criteria, although regions R6, R7, and R8 had already passed in the three-month scenario. A six-month period was chosen rather than four or five months to retain seasonal information, since each six-month period consists of two complete three-month seasonal groups.


Table 4. Number of qualified pixels by six-month periods
Period            R1    R2    R3    R4    R5    R6    R7    R8
June-November   1522  4100  3993  2688  3943  4584  3003  2128
December-May    1522  4098  3992  2675  3943  4583  2998  2128


3.5. Discussion 

The daily and monthly scenarios fell far short of fulfilling the criteria, especially in the wet season (December-May). By applying aggregation and spatial interpolation over three-month periods, time series data can be created with confidence in regions R6, R7, and R8. The number of qualified pixels in these regions passed the criteria, although in region R6 it is very close to the threshold in the December-February period. Figure 3 shows the distribution of the qualified pixels in December-February. Visual inspection shows that the distribution is not fully clustered.


Figure 3. Distribution of qualified pixels in December-February (regions R1 to R8)


To evaluate the spatial interpolation method, a dataset of average brightness temperature values was extracted from band 13 of the hourly Himawari-8 images in December-February over the successful regions (R6, R7, and R8). The time-domain criterion was applied to retrieve the average values. The mean of the average brightness temperature over the qualified pixels is 291.5 K. Ordinary kriging was applied using 30% of the qualified pixels, selected randomly, and the remaining 70% were used for evaluation. The interpolation resulted in a root mean square error (RMSE) of 0.3354 K and a bias of 0.0036 K, with errors ranging from -2.3256 K to 2.6538 K. Figure 4 shows the spatial distribution of the errors. The RMSE is low compared with the mean of the average brightness temperature. This result supports the feasibility of obtaining time series aggregate data of dynamic parameters in regions R6, R7, and R8 over three-month periods using hourly Himawari-8 images.
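The hold-out validation described above can be sketched with a plain NumPy implementation of ordinary kriging. This is an illustrative stand-in, not the authors' implementation: it assumes an exponential variogram with hypothetical fixed parameters (`sill`, `corr_range`, `nugget`), whereas in practice the variogram would be fitted to the data, and it defines the error as predicted minus observed, which is also an assumption. The coordinates and temperatures in the usage lines are synthetic.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, corr_range=10.0, nugget=0.0):
    """Exponential semivariogram; gamma(0) = 0 by definition."""
    gamma = nugget + sill * (1.0 - np.exp(-h / corr_range))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy_train, z_train, xy_pred, **vario):
    """Ordinary kriging predictions at xy_pred from samples (xy_train, z_train)."""
    n = len(z_train)
    d_tt = np.linalg.norm(xy_train[:, None, :] - xy_train[None, :, :], axis=-1)
    d_tp = np.linalg.norm(xy_train[:, None, :] - xy_pred[None, :, :], axis=-1)
    # Kriging matrix bordered by ones: the Lagrange row forces weights to sum to 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(d_tt, **vario)
    a[n, n] = 0.0
    b = np.ones((n + 1, len(xy_pred)))
    b[:n, :] = exponential_variogram(d_tp, **vario)
    weights = np.linalg.solve(a, b)[:n, :]  # drop the Lagrange multipliers
    return weights.T @ z_train


def holdout_metrics(observed, predicted):
    err = predicted - observed  # sign convention is an assumption
    return {"rmse": float(np.sqrt(np.mean(err ** 2))),
            "bias": float(np.mean(err)),
            "min_error": float(err.min()),
            "max_error": float(err.max())}


def validate(xy, z, train_fraction=0.3, seed=0, **vario):
    """Krige from a random 30% of qualified pixels, evaluate on the other 70%."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(z))
    n_train = int(round(train_fraction * len(z)))
    train, test = order[:n_train], order[n_train:]
    predicted = ordinary_kriging(xy[train], z[train], xy[test], **vario)
    return holdout_metrics(z[test], predicted)


# Synthetic usage: hypothetical pixel coordinates and temperatures (K).
rng = np.random.default_rng(1)
xy = rng.uniform(0, 50, size=(200, 2))
z = 290.0 + 0.02 * xy[:, 0] + rng.normal(0, 0.3, 200)
print(validate(xy, z, corr_range=15.0))
```

The dense distance matrices grow with the product of training and prediction pixels, so this direct solve is only practical for a few thousand pixels; a dedicated geostatistics package with variogram fitting would be the usual choice beyond that.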

To evaluate the entire procedure, including aggregation and spatial interpolation, the spatial correlation coefficient was calculated between the proposed result and skin temperature in regions R6, R7, and R8. Aggregate brightness temperature was calculated for December-February and March-May. Monthly skin temperature for the last month of each period was used instead of a three-month aggregate, to test whether the longer aggregate can represent the current condition. The correlation is high, as shown in Table 5 and Figure 5. This suggests that the proposed result can be used in applications related to skin temperature over the eastern part of Java. Further processing may be needed to extract the dynamic parameter from the interpolated brightness temperature values.
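The spatial correlation in Table 5 is a Pearson correlation over co-located grid cells. A minimal sketch, assuming the ERA5 skin temperature has already been resampled onto the same grid as the interpolated band-13 aggregate (the regridding step is not described here) and that cells outside R6 to R8 are set to NaN; the function name and example grids are hypothetical.

```python
import numpy as np


def spatial_correlation(field_a, field_b):
    """Pearson correlation between two co-registered grids, ignoring NaN cells."""
    a = np.asarray(field_a, dtype=float).ravel()
    b = np.asarray(field_b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)  # cells with data in both grids
    if valid.sum() < 3:
        raise ValueError("need at least three cells with data in both grids")
    return float(np.corrcoef(a[valid], b[valid])[0, 1])


# Hypothetical 2 x 2 grids in K; the NaN cell is excluded from the correlation.
bt = np.array([[290.1, 291.0], [292.3, np.nan]])
skt = np.array([[296.0, 297.2], [298.9, 299.5]])
print(spatial_correlation(bt, skt))
```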
Figure 4. Distribution of error values with 70% of qualified pixels used for validation


Table 5. Spatial correlation coefficient
Aggregate of band 13    Skin temperature (month)    Correlation coefficient
December-February       February                    0.703
March-May               May                         0.801
Figure 5. Scatter diagram of interpolated data and skin temperature from ERA5


Although the results are promising, the distribution of data in the time domain still needs to be evaluated. Different weights might be applied to different periods of time according to data availability, as in [26]. The spatial distribution pattern of the data should also be clearly identified to improve the criterion in the spatial domain.


4. CONCLUSION 

Aggregate data at full spatial resolution over the eastern part of Java could be obtained for shorter regular periods than over the western part. Time series aggregate data could potentially be created over the eastern part of Java by dividing hourly Himawari-8 images into three-month periods, while over the western part the period may need to be up to six months long. The result of this study increases confidence in using optical remote sensing images from the Himawari-8 satellite to create time series data over a region. In future research, a comparison with results obtained from 10-minute HCAI data might provide more robust results. For more precise results, better criteria for data distribution patterns in both the time and spatial domains should also be considered.